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Abstract 

Transposable elements (TEs) are ubiquitous inhabitants of eukaryotic genomes and their proliferation and dispersal shape genome 
architectures and diversity. Nevertheless, TE dynamics are often explored for one species at a time and are rarely considered in 
ecological contexts. Recent work with plant pathogens suggests a link between symbiosis and TE abundance. The genomes of 
pathogenic fungi appear to house an increased abundance of TEs, and TEs are frequently associated with the genes involved in 
symbiosis. To investigate whether this pattern is general, and relevant to mutualistic plant-fungal symbioses, we sequenced the 
genomes of related asymbiotic (AS) and ectomycorrhizal (ECM) Amanita fungi. Using methods developed to interrogate both 
assembled and unassembled sequences, we characterized and quantified TEs across three AS and three ECM species, including 
the AS outgroup Volvariella volvacea. The ECM genomes are characterized by abundant numbers of TEs, an especially prominent 
feature of unassembled sequencing libraries. Increased TE activity in ECM species isalso supported byphylogeneticanalysisofthethree 
most abundant TE superfamilies; phylogenies revealed many radiations within contemporary ECM species. However, the AS species 
Amanita thiersii also houses extensive amplifications of elements, highlighting the influence of additional evolutionary parameters on 
TE abundance. Our analyses provide further evidence for a link between symbioticassociationsamongplantsand fungi, and increased 
TE activity, while highlighting the importance individual species' natural histories may have in shaping genome architecture. 
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Introduction 

Transposable elements (TEs) are autonomously replicating 
pieces of DNA inhabiting the genomes of most life forms. 
The numbers of TEs encoded in species' genomes vary 
widely, but bases coding for TEs often outnumber the pro- 
tein-coding portion of a genome and can be as much as 85% 
of genomic DNA, for example in the maize strain B73 
(Schnable et al. 2009). Because they lack any apparent func- 
tion, TEs have classically been considered as junk DNA or ge- 
nomic parasites (Doolittle and Sapienza 1980; Orgel and Crick 
1980; Hickey 1982). However, during the last decade, ideas 
on the roles of TEs have changed, especially because of the 
increasing numbers of genomic sequences available that have 



highlighted the ability of TEs to generate genomic variation 
(e.g., Kidwell and Lisch 2001; Biemont 2010; Werren 2011; 
Hua-Van et al. 2011; but see McClintock 1983; Finnegan 
1989 for earlier discussions). TEs are now more often de- 
scribed as commensal structural components of a genome, 
which can behave on a spectrum between parasitism and 
mutualism (Kidwell and Lisch 2001). 

Two major classes of TEs can be distinguished, based on 
their modes of proliferation: Class I elements use an RNA- 
intermediate and move via a "copy-and-paste" mechanism. 
They include the long terminal repeat (LTR) elements and the 
long interspersed nuclear elements (LINE) (Finnegan 1989; 
Wicker et al. 2007). Class II elements transpose via DNA 
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intermediates and can be further divided into subclasses de- 
pending on whether they use a "cut-and-paste" mechanism, 
lil<e the terminal inverted repeat elements, or a "copy-and- 
paste" mechanism, for example the Helitrons (Kapitonov and 
Jurka 2001). Intact TEs encode the protein-coding sequences 
required for their proliferation, and upon activation can gen- 
erate tens or hundreds of nearly identical copies that insert 
into new locations in the genome at varying degrees of 
specificity (reviewed in Levin and Moran 2011). By inserting 
themselves into or near coding genes, TEs can create loss of 
function mutations (Nekrutenko and Li 2001), confer new 
regulatory interactions through TE-encoded transcription 
factor binding sites (Jordan et al. 2003) or cause repeat- 
associated silencing of chromosomal neighborhoods 
(Hollister and Gaut 2009). Furthermore, high copy-number 
dispersed repeats can catalyze large-scale genomic rearrange- 
ments including inversions, duplications, deletions, and chro- 
mosomal translocations through recombination of nonallelic 
homologous TE insertions (Sen et al. 2006; Han et al. 2007; 
Robberecht et al. 2013). 

TEs were at first thought to be relatively rare in fungi, 
presumably due to the small numbers found in genetic 
models, such as Saccharomyces cerevisiae and Neurospora 
crassa. However, genome sequencing efforts have revealed a 
wealth of TEs in a large diversity of fungal genomes (Daboussi 
and Capy 2003; Novikova et al. 2009; Muszewska et al. 
2011). Plant pathogens often possess especially large, 
repeat-rich genomes (Raffaele and Kamoun 2012). This 
trend is most evident in biotrophic fungi with narrow host 
ranges, including, for example, the rice blast fungus 
Magnaporthe grisea (Dean et al. 2005), the oilseed rape path- 
ogen Leptosphaeria maculans (van de Wouw et al. 201 0), the 
powdery mildew Blumeria graminis (Spanu et al. 2010), and 
the leaf rust fungi Puccinia graminis and Melampsora larici- 
populina (Duplessis et al. 2011). There are, however, some 
exceptions to the pattern, for example the corn smut 
Ustiiago maydis (Kamper et al. 2006), which has a relatively 
contracted and repeat-poor genome. Effectors, avirulence 
genes and other pathogenicity-related factors often cluster 
in repeat-rich regions and there are numerous examples im- 
plicating TE-mediated mechanisms in the genomic changes 
causing altered virulence or host-specificity (Kang et al. 
2001; Sacristan et al. 2009; van de Wouw et al. 2010; Xue 
et al. 2012). These observations imply that the deleterious 
impacts of TEs may be negligible compared with the benefits 
provided by the increased genome plasticity conferred by TEs 
in the context of a host-pathogen coevolutionary arms race 
(Raffaele and Kamoun 2012). 

The symbiosis of ectomycorrhizal (ECM) fungi and plants is 
also a biotrophic interaction, but functions as a mutualism; 
however, the mechanisms enabling symbiosis may be similar 
across the different kinds of associations (Veneault-Fourrey 
and Martin 2011). An ECM fungus grows with plant roots 
and provides various benefits to the plant in exchange for 



carbon (Smith and Read 2010). When the mutualism is estab- 
lished, gene expression programs are altered to enable the 
fungus to colonize root surfaces and grow between plant 
cells (Martin 2007). The formation of the symbiotic interface 
requires the fungus to communicate with the plant immune 
system, and the fungus may use tools comparable to host 
recognition mechanisms used by pathogens. For example, in 
the symbiosis between the ECM fungus Laccaria bicolor and 
the deciduous broadleaf tree Populus tricliocarpa, an effector- 
like small secreted protein, /W/5SP7, is secreted by the fungus 
and imported into the plant nucleus, where it directly modu- 
lates gene expression (Plett et al. 201 1). 

The genomes of the ECM fungi L. bicolor and Tuber mel- 
anosporum suggest that ECM genomes may also house ele- 
vated numbers of TEs. For example 60% and around 21-24% 
of the T. melanosporum and L. bicolor genomes, respectively, 
constitute TE-derived sequence (Martin et al. 2008, 2010; 
Labbe et al. 2012). ECM fungi coevolving with their hosts 
may experience selective pressures similar to those experi- 
enced by plant pathogens. Like pathogens, ECM fungi are 
obligately dependent on plants and the decline of one host 
species may necessitate the switch to another (Raffaele and 
Kamoun 2012). This dynamic may favor the maintenance of 
genome plasticity (Martin and Selosse 2008; Veneault-Fourrey 
and Martin 2011). However, a key assumption of the host- 
pathogen coevolutionary arms race model (Raffaele and 
Kamoun 2012) does not hold; in contrast to most biotrophic 
pathogens, many ECM fungi are generalists (Bruns and 
Bidartondo 2002; Kennedy et al. 2003; but see Smith et al. 
2009) and an individual fungus associates with multiple trees 
(Horton and Bruns 2001; Saan et al. 2005). 

Our current understanding of TE dynamics in ECM fungi is 
patchy and largely limited to comparisons between a small 
number of species (Labbe et al. 201 2) or over large evolution- 
ary distances (Novikova et al. 2009; Muszewska et al. 201 1), 
making it difficult to comment on potential mechanisms shap- 
ing TE content. To investigate TE content evolution in ECM 
fungi at a finer resolution, we sequenced the genomes of five 
species of fungi within the genus Amanita, as well as the 
asymbiotic (AS) outgroup Volvariella volvacea. The genus 
Amanita encompasses more than 500 species, including the 
charismatic A. muscaria (often depicted in fairy tales) and the 
deadly poisonous death cap, A. phalloides. The genus is found 
on all continents and houses both ECM and free-living fungi. 
The number of symbiotic species, which associate with a di- 
versity of plants, is far greater than the number of AS species. 
Furthermore, the AS Amanita have recently been shown to 
form a monophyletic clade basal to the ECM Amanita, sup- 
porting a single origin of ECM symbiosis within this genus 
(Wolfe, Tulloss, et al. 2012). We chose to sequence one rep- 
resentative from each of three large ECM clades: A. brunnes- 
cens, A. polypyramis and A. muscaria var. guessowii, as well as 
the AS species A. thiersii and A. inopinata. We developed 
analytical approaches to characterize and quantify TE content 
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by combining assembly-based and assembly-free methods. 
Tlie latter technique addresses the issue of underrepresenta- 
tion of repeats in de novo assemblies derived from short 
sequencing reads (fig. 1) (Alkan et al. 201 1). 

We found ECM genomes to house elevated TE contents 
compared with A. inopinata and the outgroup V. volvacea, 
especially after consideration of unassembled reads. Results 
mirror the phylogenetic analyses of TE families, where large 
amplifications of TEs are found in ECM species. But, the AS 
species A. thiersii also houses a large number of TEs that have 
recently expanded. 

Materials and Methods 

Fungal Strains and DNA Extraction 

Sources and cultures of Amanita and the outgroup species 
are described in table 1 . Cultures were maintained on solid 
modified MMN medium (0.5 ml/1 CaC^xZHzO], 0.5 ml/I 
FeCl2[x6H20], 1ml/l NaCI, 1 ml/1 MgS04[x7H20], 5ml/l 
[NH4]2HP04, 10 ml/1 KH2PO4, 2g/l malt extract, 5g/l potato 
dextrose broth, 5 g/l dextrose, 2 g/l cellobiose, 2 g/l polypep- 
tone peptone, and 1 g/l yeast extract) with the addition of 
lOOx BME vitamins (MP Biomedicals, Santa Ana, CA) and 
antibiotics (150mg/l streptomycin, 150mg/l penicillin). For 
DNA extraction, fungi were grown on liquid modified MMN 
medium and incubated in the dark at 27°C for 2 weeks prior 
to harvesting. Harvested mycelia were ground in liquid nitro- 
gen and extracted as described below. 

Amanita thiersii DNA was extracted using the Qiagen ge- 
nomic tip extraction protocol as per manufacturers' instruc- 
tions (Qiagen, Valencia, CA). DNA from the additional species 
was extracted using the "Phytophtora genomic DNA" phenol/ 
chloroform protocol available from JGI (http://jgi.doe.gov/col- 
laborate-with-jgi/pmo-overview/protocols-sample-preparation- 
information/, last accessed June 17, 2014). Following 
extraction, all samples were cleaned using Qiagen Genomic- 
tip 100/G columns, according to the manufacturers' protocols 
and starting after the DNA isolation step (Qiagen, Valencia, 
CA). Quantity and quality of the samples were assessed using 
an Agilent 2100 Bioanalyzer. 

Sequencing and Assembly of JGI Genomes 

The A. thiersii genome was sequenced using the Roche 454 
and lllumina platforms including one 454 Rapid library, one 
4-kb 454 paired-end library and one 2 x 76 3-kb lllumina 
paired-end library. An initial assembly of the lllumina data 
was generated using Velvet (Zerbino and Birney 2008), fol- 
lowed by a Newbler assembly of the resulting contigs together 
with the 454 libraries (-fe reads2remove -info -ace -qo -sio -a 
50 -I 350 -g -ml 30 -mi 97). This resulted in a 45 x coverage 
assembly with 2,370 scaffolds, 36-kb scaffold N50, 37.2-Mb 
total scaffold, 5,959 contigs, 21.8-kb contig N50, and 
39.4-Mb total contig. Allpaths fragment and jumping libraries 



were simulated from the Newbler contigs using wgsim 
(Li et al. 2009) with the following options: -e 0 -d 4000 -N 
45000000 -1 100 -2 100 -r 0 -R 0 -X 0. The simulated and 
lllumina data were subsequently assembled with AIIPathsLG 
release version R38445 (Gnerre et al. 201 1), resulting in the 
assembly detailed in table 2. 

The A. muscaria var. guessowii genome was se- 
quenced using the lllumina platform with one 2x100 
3.5 kb lllumina long fragment paired-end library, one 
2x 1003-kb lllumina paired-end unamplified library and 
one 2 x 15027-kb lllumina paired-end unamplified library. 
Each fastq file was QC filtered for artifacJ/process contamina- 
tion and subsequently assembled with AIIPathsLG release ver- 
sion R42328 with HAPLOIDIFY = True (Gnerre et al. 2011), 
resulting in the assembly detailed in table 2. 

Sequencing and Assembly of Additional Genomes 

We sequenced a single lane of lllumina reads for each of the 
additional species as well as an independent replicate of the 
A. muscaria genome. Paired-end libraries of 300-bp total frag- 
ment size were prepared at the Harvard Biopolymers facility 
(www.genome.med.harvard.edu, last accessed June 17, 
2014) using the lllumina TruSeq gDNA protocol (lllumina, 
Cambridge, UK) and sequenced to 100 bp on an lllumina 
HiSeq2000 instrument. The raw read data were preprocessed 
using Trimmomatic v.0.22 (Lohse et al. 2012) to remove any 
residual sequencing adapters and low quality sequences. 
Leading and trailing bases with quality scores less than Q28 
were trimmed and a sliding window analysis across 5-bp win- 
dows was used to eliminate reads when the average quality 
dropped below Q18. After adapter removal and low-quality 
trimming, any sequences shorter than 50 bp were removed 
from each data set. 

The trimmed libraries were assembled using ABySS v.1 .3.3 
(Simpson et al. 2009) with the following parameters: 7 = 8, 
5=200-5,000, /=(k-mer - 20) and n=10 for all k-mer 
values between 33 and 89. Contiguity statistics (longest scaf- 
fold and N50), were calculated for each assembly after any 
scaffolds shorter than 200 bp were removed. We also scored 
different assemblies for completeness and redundancy by 
probing for core eukaryotic genes using CEGMA (Parra et al. 
2007). Final assemblies were chosen to maximize contiguity 
and completeness while minimizing redundancy (table 2). 

TE Identification and Classification 

TEs were identified using a combination of homology-based 
methods, de novo detection of overrepresented sequences, 
and structure-based approaches. We first screened the 
genome assemblies for TE-derived sequences using tBLASTX 
v.2.2.25 + (Gish and States 1993) with translated protein- 
coding sequences from Repbase v. 17.08 (Jurka et al. 2005). 
The search was run without sequence filtering at an e value 
threshold of 10"^^. In addition to tBLASTX searches we ran 
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Fig. 1. — ^The challenges associated with estiinating TE content from assemblies generated using short read data. (A) Assemblers cannot disambiguate 
reads from different locations and so collapse nearly identical repeats, often causing breakpoints in the assembly. (6) TE regions (green) on the Amanita 
poiypyramis contig in the bottom panel show greatly increased coverage compared with the rest of the contig and the contig containing housekeeping 
(CEGMA) genes (blue, top panel), evidence of collapsed repeats. (0 Example of genome-wide coverage data for Voivarieiia volvacea (AS) and A. poiypyramis 
(ECM). Gray points correspond to CEGMA genes and the points for transposable elements are colored by superfamily (see fig. 2). In V. voivacea JE coverage 
is within range of CEGMA coverage, whereas a large increase in the coverage of various elements, including for example Gypsy elements (blue), is visible in 
the A. poiypyramis data. 
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Table 1 

Fungal Strains 



Species 


Strain 


Collector 


Provenance 


Date Collected 


Niche 


Habitat 


Amanita brunnescens 


Koide BX004 


R. Koide 


Haugh West, Pennsylvania 


August 2003 


ECM 


With red pine 


A. polypyramis 


BW_CC 


B. Wolfe 


Cape Cod, Massachusetts 


October 2007 


ECM 


Mixed oak and 






(through Boston 








pine forest 






Mycological Club 










A. muscaria" 


Koide BX008 


R. Koide 


Haugh West, Pennsylvania 


August 2003 


ECM 


With red pine 


A. inopinata 


Kibby_2008 


G. Kibby and B. Wolfe 


Suffolk, United Kingdom 


October 2008 


AS 


At edge of pasture 


A. thiersil° 


Skay4041 


S. Kay 


Baldwin City, Kansas 


2009 


AS 


Lawn 


Volvariella volvacea 


PS #WC 439 


Penn State 


China 


1984 


AS 


Unknown 






Spawn Collection 











Note. — ECM, ectomycorrhizal. 

^Amanita muscaria is a name used for a species complex (GemI et al. 2008); strain Koide BX008 is A. muscaria var. guessowii {www.amanitaceae.org, last accessed June 
17, 2014). 

^/Volfe, Kuo, et al. (2012). 



Table 2 

Draft Genome Assemblies 





ECM 


ECM 


ECM 


ECM 


AS 


AS 


AS 




Amanita brunnescens 


A. polypyramis 


A. muscaria 
JGI 


A. muscaria 


A. inopinata 


A. thiersii 
JGI 


Volvariella volvacea 


Total assembly size (Mb) 


57.6 


23.5 


40.7 


67.6 


22.1 


33.7 


52.4 


Ploidy 


Dikaryon 


Dikaryon 


Dikaryon 


Dikaryon 


Dikaryon 


Monokaryon 


Dikaryon 


Assembler 


ABySS 


ABySS 


AllpathsLG 


ABySS 


ABySS 


AllpathsLG 


ABySS 


Number of scaffolds 


17,039 


5,295 


1,011 


17,516 


5,912 


1,446 


4,019 


Longest scaffold (kb) 


497.0 


384.1 


1,491.6 


158.6 


2,165.3 


1,038.0 


1,066.4 


Scaffold N50 (kb) 


11.0 


61.2 


168.1 


12.1 


156.2 


77.0 


54.6 


Number of contigs 


24,844 


6,690 


3,814 


24,994 


6,157 


2,164 


6,360 


Longest contig (kb) 


260.6 


384.2 


508.8 


158.6 


2,081.7 


1,038.0 


719.7 


Contig N50(kb) 


8.6 


48.5 


30.1 


10.5 


86.6 


60.4 


44.0 


CEGMA completeness % 


94.6 


95.6 


92.3 


92.3 


96.0 


96.0 


95.6 


CEGMA redundancy 


1.8 


1.3 


1.1 


2.9 


1.1 


1.1 


1.7 



Note. — Summary statistics of the draft genome assemblies generated for each species. Columns marked "JGI" highlight genomes assembled by DOE-JGI. ECM and AS 
refer to ectomycorrhizal and asymbiotic ecology, respectively. Percentages of CEGMA core eukaryotic genes (Parra et al. 2007) recovered in each assembly were used as 
estimates of gene space completeness. CEGMA redundancy is the average copy number of single copy CEGMA genes detected in each genome and serves as an indicator of 
the amount of heterozygosity in an assembly. 



the BLASTER suite (Quesneville et al. 2003) for de novo detec- 
tion as well as LTRHarvest (Ellinghaus et al. 2008) for structure- 
based detection of TEs. The results of all three searches were 
fed into the REPET TEdenovo pipeline (Flutre et al. 201 1) that 
we nnodified to run on an LSF cluster. Briefly, TEdenovo uses 
the programs Piler (Edgar and Myers 2003), GROUPER 
(Quesneville et al. 2003), and RECON (Bao and Eddy 2002) 
to cluster the TEs identified by the different methods and re- 
construct a consensus for each group of matches. The Python 
scripts we developed for pipelining elements of the REPET 
pipeline on an LSF cluster are available on request from the 
corresponding author. 

The reconstructed TE consensus sequences were dedupli- 
cated and classified into class, order, and superfamily using 
the REPET TEdassifier (Flutre et al. 201 1). TEclassifier is based 
on matches with Repbase, the presence of key Pfam (Finn 
et al. 2006) domains (e.g., reverse transcriptase or transposase 



domains), and structural features such as long-terminal re- 
peats or target site duplications. Clustering cutoffs for consol- 
idating individual elements were set at 95% identity over 98% 
of the element length as those were determined to be the 
optimal parameters for a low redundancy database of TEs 
(Flutre et al. 201 1). The automatic assignments were manually 
assessed to remove false positives and spurious matches and 
to resolve conflicting annotations. The fragmented and repet- 
itive nature of our genome assemblies (table 2) has the po- 
tential to cause inflated numbers of false positive matches 
in de novo searches, and so we decided on the following 
stringent filtering criteria: A TE was only retained if it had a 
significant BLAST match (<10"®) with an element in Repbase 
or contained a TE-derived Pfam domain (as defined by the 
REPET-curated Pfam library). Any matches that had a signifi- 
cant hit (< 10"^) to a non-TE Pfam domain were removed 
from the library. 
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For the final annotation of TEs in each of the genonnes, 
we connbined all reconstructed elements into a single library 
and used it as an input library for RepeatMasker v. 3.30 
(Smitetal. 2010). RepeatMasker was run using an alignment 
cutoff of 250 (-cutoff 250) and sensitive search (-s). The TE 
locations identified by RepeatMasker were deduplicated 
using MATCHER from the BLASTER package (Quesneville 
et al. 2003), and we retained only the match with highest 
sequence identity in cases of overlapping annotations. This 
nonredundant set of TE annotations was used for all further 
analyses. 

Coverage-Based Quantification of TEs 

Genome assemblies based on short-read sequencing data 
commonly suffer from an underrepresentation of repeated 
sequences (Alkan et al. 201 1 ; fig. 1). As the majority of our as- 
semblies are based on lllumina short-read libraries we sought 
to specifically target this issue and provide a different perspec- 
tive by calculating TE content from the unassembled libraries 
using a depth-of-coverage approach. First, we assume an ap- 
proximately even sequencing coverage across each genome. 
By comparing the sequencing depth of TE sequences to se- 
quencing depth of unique genomic sequences, we calculate 
a metric enabling us to estimate the entire TE content of a li- 
brary, both ancient TEs and relatively more recent, undiverged 
TEs. 

This relative coverage for TE regions was calculated by first 
aligning our lllumina gDNA libraries to their respective assem- 
blies. In the analysis of A. thiersii, we used a 76-bp paired-end 
library generated by the JGI available in SRA under accession 
number SRR065573. Reads were aligned using Bowtie 2 
(Langmead and Salzberg 2012) in end-to-end alignment 
mode, reporting only the best match for each read. 
Fragment counts for all genomic regions were calculated 
using HTSeq-count (wvvw-huber.embl.de/users/anders/ 
HTSeq/, last accessed June 17, 2014), discarding reads that 
map to multiple features. TE regions were scored using the 
deduplicated RepeatMasker annotations to count the number 
of fragments by repeat ID, meaning that if a TE was found in 
multiple genomic locations, total counts for a repeat ID can 
reflect read counts consolidated over several different scaf- 
folds. Coverage of the CEGMA gene regions was calculated 
accordingly, taking into account all reads mapping between 
the start of the first and end of the last exon, including introns. 
To alleviate mapping artifacts due to the intrinsically repetitive 
nature of TE sequences we decided to calculate the approxi- 
mate TE copy number at the superfamily-level, on the basis of 
different superfamilies being sufficiently divergent to avoid 
unspecific mapping. A scaling factor St for each superfamily 
was estimated as the ratio of the sum of fragments mapped 
per kilobase per million reads aligned (FPKM) of all target 
repeat IDs belonging to a superfamily over the median 
FPKM of all CEGMA genes. The corrected TE content 



estimates for each superfamily were calculated by scaling 
the assembled TE content by its scaling factor Sp 

TE Fannily Clustering, Prediction of Protein-Coding 
Regions, and Phylogenetic Analysis 

Clustering of elements into TE families was performed using 
USEARCH V. 5.0.144 (Edgar 2010) with the parameters -id 
0.8 -queryfract 0.8 -rev -maxrejects 128, choosing the lon- 
gest element for each family as the representative sequence. 
Annotations for all TEs were updated to reflect the lowest 
level of classification shared between the members of a 
given family. 

We first predicted protein-coding sequences for all repeat 
IDs using Genewise (Birney et al. 2004) with the amino acid 
sequences of the five best BLASTX matches in Repbase as 
targets and allowing for the inclusion of stop codons. In 
some cases, the annotated TEs do not span the entire pro- 
tein-coding sequence, especially in regions where TEs are 
nested or in close proximity to one another (data not 
shown). To obtain the most complete possible set of TE- 
derived protein-coding sequences, we therefore included a 
second search, using the protein-coding sequences predicted 
from the repeat IDs to identify TE protein-coding sequences in 
the genome assemblies directly. We screened each assembly 
against the predicted TE proteins using BLASTX with an e 
value cutoff of 10^^^. Scaffold fragments encompassing the 
candidate locations plus an additional 500-bp upstream and 
downstream were excised from the assemblies and fed into 
Genewise, together with the matching query sequences to 
obtain individual protein predictions for each TEs (as above). 

For the phylogenetic analyses of our three target element 
superfamilies (Copia, Gypsy, and LINE), amino acid sequences 
belonging to each superfamily were aligned using an iterative 
approach. We first aligned sequences of at least 500 amino 
acids, as those are expected to yield better alignments. 
Alignments were run using PAGAN (Loytynoja et al. 2012), 
a phylogeny-aware aligner. To improve alignments, we calcu- 
lated ML guide trees from the first alignments using R/\xML v. 
7.7.5 (Stamatakis 2006) with a WAG+r model, and then 
repeated alignments with the new guide trees. 

PAGAN also implements a guided placement algorithm 
that can align shorter sequence fragments into existing align- 
ments of full-length sequences. We used this feature to align 
predicted proteins that were shorter than 500 amino acids 
into the full-length TE superfamily alignments. Sequences 
shorter than 100 amino acids were omitted from analyses 
as those tended to align poorly even in a guided alignment 
(data not shown). Starting from the root of the ML guide tree, 
we tagged the deepest nodes containing only elements from 
the same species with the name of that species. Each frag- 
ment was then aligned into the best-fitting node for its spe- 
cies. To avoid disjoint alignments of short sequences spanning 
different domains, we removed all fragments that did not 
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overlap the reverse transcriptase region by at least 25%. 
Finally, weakly aligning regions were trimmed from align- 
ments using trimAI (Capella-Gutierrez et al. 2009) with the 
following parameters: -gt 0. 1 . The resulting amino acid align- 
ments contained 1,168 positions in 1,071 sequences (LINE), 
1 ,289 positions in 330 sequences (Copia), and 1 ,287 positions 
in 1 ,229 sequences (Gypsy). 

We determined the best-fit model for amino acid analyses 
using ProtTest 3.2 (Guindon and Gascuel 2003; Darriba et al. 
201 1). The JTT model of evolution (Jones et al. 1992) with 
r-distributed rates (+r) and empirical amino acid frequencies 
(+F) performed best for all three superfamilies independent of 
the selection criterion. Amino acid trees were calculated using 
RAxMLv. 7.7.5 (Stamatakis 2006; Stamatakisetal. 2008) with 
the JTT+r+F model. Bootstrapping (BS) analyses for each tree 
were performed using the fast BS algorithm implemented 
in RAxML (-f a), with an automated stopping criterion 
(-autoMRE). BS runs stopped after 350 replicates in the case 
of LINE and Copia and 450 for the Gypsy alignment. 
Ultrametric trees were estimated from the ML trees using 
PATHd8 (Britton et al. 2007) and rooted with the V. volvacea 
outgroup that minimizes duplications and losses as deter- 
mined using Notung 2.6 (Chen et al. 2000) with default 
parameters. 

Results 

Draft Genomes 

We sequenced the genomes of the ECM fungi A. brunnes- 
cens, A. polypyramis, and A. muscaria var. guessowii (hereaf- 
ter referred to simply as A. muscaria), the closely related 
saprotrophs A. inopinata and A. thiersii, and the more 
distantly related outgroup V. volvacea. Sequencing and as- 
sembly of A. thiersii and A. muscaria were completed as 
part of the Department of Energy Joint Genome Institute 
(JGI) Community Sequencing Programs (CSP# 402019 and 
403202, respectively) and were based on multiple libraries 
of short- and long-range paired-end lllumina reads, plus addi- 
tional A. tliiersii 454 libraries. The draft genomes of all other 
species, as well as a replicate of the A. muscaria genome, were 
sequenced and assembled using single short-range PE lllumina 
libraries (table 2). 

De novo assembly from single lllumina libraries proved a 
successful strategy for reconstructing gene space, and on av- 
erage 95% of conserved eukaryotic (CEGMA) genes were 
recovered from each genome (table 2). The numbers of 
CEGMA genes found in single-library assemblies are compa- 
rable to those recovered from the multilibrary JGI assemblies 
although, not surprisingly, the single-library assemblies are 
considerably more fragmented. This point is illustrated in a 
direct comparison between the two A. muscaria assemblies 
(table 2). The same CEGMA genes are present in both assem- 
blies despite the greatly different levels of fragmentation: 



Scaffold N50 was 168kb in the JGI assembly, compared 
with 12kb in the single-library assembly. We also see an 
increased level of redundancy in some of the single-library 
assemblies, which we interpret as a reflection of the inability 
of the assembler to distinguish whether two highly similar 
genomic regions arose from a recent duplication, or constitute 
the two heterozygous haplotypes of the region in a diploid 
genome. The level of redundancy may thus serve as an indi- 
cator of the heterozygosity found in the respective dikarya. 
Redundancy is most pronounced in the A. muscaria, A. brun- 
nescens, and V. volvacea assemblies. The A. muscaria single- 
library assembly has an average copy number of 2.9 for each 
CEGMA gene, compared with 1.1 in the JGI assembly. The 
A. brunnescens and V. volvacea assemblies are both ap- 
proaching an average copy number of 2. Thus, the relatively 
larger assembly size for these species may be explained by 
heterozygosity in these diploid fungi, and the assembly of 
different alleles onto different contigs, rather than by exten- 
sive genome expansion. This is supported by the recent pub- 
lication of a monokaryotic V. volvacea genome sequence with 
a total assembly size of 35.7 Mb (Bao et al. 2013), which 
compares with the 52.4 Mb of our dikaryon assembly in a 
proportion that is similar to the estimated CEGMA redun- 
dancy (1 .5). Our current focus is to quantify TE content, and 
not to compare protein-coding genes, and we do not attempt 
gene prediction beyond the CEGMA genes. Future publica- 
tions will more formally compare the gene content of the 
different species. 

TE Prediction and Quantification Based on Assemblies 

TEs were predicted from assembled genomes in two steps: 
First, we identified and reconstructed consensus elements in 
each assembly following the first part of the REPET pipeline 
(Flutre et al. 201 1). The resulting single-species libraries were 
combined into an aggregate TE library (supplementary table 
SI and data file SI, Supplementary Material online), and al- 
though it includes elements found in V. volvacea, for simplicity 
we refer to it as the "Amanita TE library" hereafter. 
Consensus elements were classified using the REPET classifier 
and manually filtered to remove individual elements where 
there was no direct evidence for identity as a TE (see 
Materials and Methods for details). Our approach risks dis- 
carding previously uncharacterized types of TEs, but with 
the limitations of our data in mind, we focused on tracking 
the dynamics of known families of TEs rather than exhaus- 
tively describing the complete set of TEs in any particular 
genome. For this reason, we also avoided a kmer-based 
analysis of repeat content. 

The final Amanita TE library consists of 7,376 consensus 
elements belonging to 1 6 different superfamilies and includes 
all of the orders of TEs described in Wicker et al. (2007), with 
the exception of Crypton elements (supplementary fig. SI and 
table SI, Supplementary Material online). A large proportion 



1 570 Genome Biol. Evol. 6(7): 1564-1 578. doi:10.1093/gbe/evu121 Advance Access publication June 12, 2014 



TE Dynamics in Amanita Fungi 



GBE 



□ Non-TE bases 




B 



Assembled TE bases EH Unassembled TE bases 

'WZZ 



A. brunnescens 



B 



A. polypyramis 
A. muscaria (JGI) 
A. muscaria 
A. inopinata 
A. thiersii 

V. volvacea 



t 



15 

L_ 



J L 



I 

0 



I L 



10 

_l_ 



20 



_L 



30 

_l_ 



40 



Assembled TE content (MB) 



Classl/LINE 
Classl/LTR/Gypsy 



Estimated TE content from 
unassembled libraries (MB) 

Classl/LTR/Copia ■ Classll/TIR/Tc1-Mariner ■ NoCat 
Classl/Other □ ClassI I/Other 
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of the reconstructed TEs belong to the Gypsy and Copia 
superfamilies of LTR retroelements (51% and 18%, respec- 
tively), as is commonly found across the fungi (Daboussi and 
Capy2003; Muszewska etal. 201 1). Another large proportion 
of consensus elements (15%) belong to the LINE non-LTR 
retroelements. Together, class I elements make up over 
80% of the Amanita TE library whereas a diversity of class II 
DNA tranposons only makes up about 1 5 % of the library. 
Clustering elements into families according to the "80-80- 
80" rule (80% of nucleotide identity over 80% of the 
sequence for at least 80 bp; Wicker et al. 2007) revealed 
3,204 families with 2.3 members on average (supplementary 
table SI, Supplementary Material online). 

The second step of our protocol used RepeatMasker (Smit 
et al. 201 0) and the Amanita TE library to identify the location 
of individual repeats in each of our genome assemblies. 
Genomic regions that were annotated with more than one 
element were deduplicated, keeping only the best TE match 
(supplementary tables S2-S8, Supplementary Material online). 
Proportions of TEs found in draft assemblies varied from 
around 5% in A. inopinata and V. voivacea to 26% in 
A. thiersii {f\g. 2A). Despite considerable differences in overall 
TE content, all of the species house a diverse set of TEs span- 
ning most major superfamilies, although there are also low 
frequency repeats, for example the Maverick and Penelope 
elements, which show a more patchy distribution 



(supplementary fig. SI, Supplementary Material online). 
Generally, TE content in each genome mirrors the composition 
of the consensus library, with Gypsy and Copia superfamilies 
dominating TE populations. A large expansion of LINE is ap- 
parent in the genome of A. brunnescens, and to a lesser 
degree is also visible in its closest relative, A. polypyramis. A 
similar expansion, but of Gypsy elements, is evident in A. 
thiersii. Although the diversity (presence or absence) of ele- 
ments is similar across all species, the relative frequencies of 
individual TE superfamilies are highly variable and show dis- 
tinct amplification profiles. 

TE Quantification from Unassembled Libraries 

A pitfall of whole genome shotgun (WGS) sequencing is the 
inability to accurately resolve nearly identical repeats in these 
data (Alkan et al. 201 1; fig. 1). Read lengths and short-range 
library sizes are often shorter than an average TE, resulting in 
the superposition of TEs and other recently duplicated regions 
in WGS assemblies (fig. \A). The median consensus length of 
complete elements reconstructed in A. thiersii, the only assem- 
bly in which we could identify a sizeable number of complete 
consensus elements, is 6,583 bp. That length is far larger than 
the 300-bp fragment size libraries used to sequence and as- 
semble the single-library genomes. TE content estimates 
based on assembled draft genomes (fig. 2A) are likely to rep- 
resent lower bounds. Estimates may also be biased toward 
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more ancient TE insertions, which would have had time to 
accumulate mutations and will more easily resolve into sepa- 
rate scaffolds. Moreover, genome assemblies derived from 
diploid fungi will vary in the degree to which TE insertions 
that are present in both haplotypes have been assembled 
onto the same or different scaffolds. Heterozygous copies of 
the same TE insertion in a diploid genome may assemble onto 
different scaffolds. The degree to which this happens is un- 
known, but is likely to be different for each species. TE esti- 
mates from assembled content are not likely to be directly 
comparable (table 2). 

Protocols to characterize TE content from raw sequencing 
libraries may obviate these issues and have been used effec- 
tively with plant genomes (Tenaillon et al. 2011; Hertweck 
2013; Senerchia et al. 2013). To establish a different and per- 
haps more realistic picture of TE content, one that is compa- 
rable across species, we turned to the unassembled libraries 
and developed a sequencing coverage-based method to re- 
estimate the amount of TEs present in each genome (see 
Materials and Methods). 

Our approach identified many TEs not found within the 
assembled genomes, confirming the presence of collapsed 
TE sequences in our assemblies and providing a different 
perspective on TE content across the phylogeny (fig. 26). 
We found particularly large amounts of unassembled TEs in 
A. brunnescens and A. polypyramis, increasing the overall TE 
content estimated in these species to 36% and 59%, respec- 
tively. Although many different types of unassembled TEs are 
found in the genome of A. brunnescens, a distinct amplifica- 
tion of Gypsy elements is found in A. polypyramis. This ampli- 
fication was already apparent in the raw coverage data 
(fig. 1Q. Remaining species house moderate amounts of 
unassembled TEs, with the exception of V. volvacea, where 
coverage of TE regions tends to be lower than that of unique 
genomic sequence. This is likely an effect of ploidy; although 
the majority of CEGMA genes appear to be present as a single 
haplotype, and thus are mapped at higher coverage, the bulk 
of the TE regions appear to be present as either two haplo- 
types or only present on one of the chromosomes, and so are 
mapped at half the coverage (fig. 1 0- 

Phylogenetic Analyses 

To provide a phylogenetic perspective on our comparative 
data, and document patterns of amplification and loss of TE 
families, we analyzed the assembled portion of our TE reper- 
toires in a phylogenetic framework. Protein sequences span- 
ning the reverse transcriptase domains of the three largest 
superfamilies (Copia, LINE and Gypsy) were predicted from 
the genome assemblies, aligned and used to estimate maxi- 
mum-likelihood (ML) phylogenies. Ultrametric trees for each 
superfamily were derived from ML trees by running a mean 
path length method (Britton et al. 2007). 



The three superfamilies show contrasting phylogenetic pat- 
terns (fig. 3). The most pronounced differences are in the age 
distributions of the TE copies. Around half of the Copia ele- 
ments belong to deep clades containing small numbers of 
elements from multiple species. The largest expansion is 
found in A. thiersii with 85 extant elements. In contrast, 
around 80% of LINE and Gypsy elements are part of young, 
species-specific clades, often encompassing hundreds of ele- 
ments, for example the /4. brunnescens expansion in LINE (699 
elements) or the A. thiersii expansion in Gypsy elements (494 
elements). These patterns imply that many of the Copia 
elements found in our genomes are derived from ancient 
amplifications, and that there was comparatively little recent 
activity, whereas the LINE and Gypsy superfamilies are char- 
acterized by abundant recent amplifications. 

The phylogenetic data mirror patterns suggested by the 
comparative analysis of assembled TE content (fig. 2A). 
Amanita ttiiersii. the species with the highest assembled TE 
content, shows amplifications in all three superfamilies (fig. 3, 
blue clades). The most prominent amplification is found among 
Gypsy elements, where 494 elements (about 40% of the Gypsy 
elements analyzed) fall into a single A. thiersii-speafic clade, 
whereas the A. tfiiersii clades among LINE and Copia amplifi- 
cations are smaller (7 1 and 85 elements, respectively). Similarly, 
the large increase in the numbers of LINE seen \nA. brunnescens 
and A. polypyramis reflects amplifications in these species (fig. 
3, green and orange clades, respectively). Amanita brunnes- 
cens houses the largest clade with 699 elements, whereas A. 
polypyramis LINE have expanded in two separate clades con- 
taining 108 and 91 elements, respectively. Although A. brun- 
nescens and A. polypyramis are close relatives and a common 
origin of the amplified LINEs seems plausible, our phylogenetic 
data suggest independent amplifications in A brunnescens and 
A. polypyramis. The elements fall into distinct, strongly sup- 
ported clades with bootstrap values between 97 and 1 00. 

Gypsy elements show the most diverse patterns of TE ac- 
tivity. Species-specific amplifications are evident for all species, 
suggesting recent activity of Gypsy elements across the genus. 
We are able to distinguish at least five deep clades that predate 
the divergence of V. volvacea and the genus /4man/fa. TE am- 
plifications are concentrated in two of these clades, marked 
clade A and clade B (fig. 3). Apart from a smaller amplification 
in V. volvacea (45 elements), clade A is dominated by ECM 
species which contribute 84% of the 356 extant elements. 
Within clade A we find three well-supported lineages that 
date to at least the base of the ECM species. Clade B houses 
TEs from a more diverse set of species and contains the large A. 
thiersii amplification discussed above, as well as a sizeable A. 
brunnescens amplification (110 elements). 

TE Amplification and ECM Ecology 

Our different analyses provide distinct perspectives on TE pro- 
liferation and abundance in symbiotic fungi. Analyses based on 
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Fig. 3. — ML phylogenies of the predicted protein sequences of tlie tliree largest TE superfamilies. Brandies are colored according to the 
species phylogeny shown bottom left (Wolfe, Tulloss, et al. 2012). Nodes near the root are marked according to their bootstrap support (circle: 70-90, 
filled circle: > 90). 



assembled genonnes suggest the AS, decomposer fungus A. 
f/?/ers;/ as the species with the greatest proportion of TEs relative 
to coding sequence (TEs are 26% of the genome, fig. 2A), and 
although the genome of the ECM species A. brunnescens is 
also rich in TEs (18% of the genome), the ECM species A. 
polypyramis and A. muscaria house relatively modest propor- 
tions of repeats (1 1 % and 9%, respectively). However, both A. 
polypyramis and A. muscaria house around twice as many TEs 
than either of the AS species/^, inopinata or V. volvacea (5% in 
both species). Analyses based on unassembled genomes reveal 
a complementary pattern. Estimates of TE content in the ECM 
species are between two and five times greater than estimates 
based on assembled content (36% in A. brunnescens, 59% in 
A. polypyramis, and 22% in A. muscaria). The proportions of 
unassembled TE content found in the AS species were generally 
smaller, with almost no change in V. volvacea (5% total con- 
tent), and about one and a half times as much in A. inopinata 



and A. thiersii (8% and 36% total TE content, respectively). 
Data suggest an excess of young, unassembled TE copies in 
several species, and most obviously in the ECM species. 

All three superfamily phylogenies, but especially those of 
LINE and Gypsy elements (fig. 3) show the hallmarks of TE 
expansions in ECM species. By contrast, amplifications in 
either A. inopinata or V. volvacea are relatively modest and 
less frequent. Phylogenetic data suggest that different clades 
of TEs may have amplified independently in different 
ECM species, for example among LINE where the large 
A. brunnescens amplification groups with smaller clades 
from A. muscaria, A. thiersii and A. Inopinata, rather than 
with the amplifications in its closest relative A. polypyramis. 
Amanita brunnescens and A. muscaria elements are also 
abundant among the TEs retained over longer evolutionary 
distances, as evident from their ample presence in the 
deeply divergent clades of the Copia and Gypsy superfamilies. 
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The pattern of increased retention may point toward lower 
rates of TE loss in these ECM species. 

Nevertheless, ECM species are not the only species housing 
TE expansions. The saprotroph A. thiersii is a species with a 
high proportion of TEs in the genome, and expansions of all 
three superfamilies are apparent. 

Discussion 

Methodological Aspects 

Short-read sequencing has rapidly emerged as a widely used 
method for the study of genome evolution. The decreased 
cost of sequencing coupled with advances in bioinformatics 
has resulted in a growing understanding of the mechanisms 
shaping the evolution of gene content and regulation from 
broad phylogenetic scales to the fine-grained resolution of 
populations. Although most analyses are focused specifically 
on gene space in the wider sense (including genes and non- 
coding regulatory sequences), TEs, which can play a major role 
in the reshaping of genomic architecture (e.g.. Sen et al. 2006; 
Han et al. 2007; Robberecht et al. 2013), often quite literally 
fall between the cracks. 

We developed two, complementary approaches to analyze 
TE diversity and dynamics using short-read sequencing across 
six fungal genomes. We first assembled draft genomes to 
identify TE families and built a reference set of elements for 
annotation of assembled genomes. We then developed a 
method to probe the unassembled portions of our libraries, 
by comparing the relatively different sequencing depths of 
identified TEs and annotated housekeeping genes. Inclusion 
of the coverage-based quantification dramatically increased 
the predicted TE content in many species, underscoring the 
importance of using assembly-free methods to gauge TE 
content. Recently, coverage-based approaches using raw se- 
quencing reads have been used effectively for quantification 
of TEs in plants (Tenaillon et al. 2011; Hertweck 2013; 
Senerchia et al. 2013). In the aggregate, our methods provide 
promising new approaches for extracting information about 
TE distributions from unassembled data. 

In our data, the difference between assembled and unas- 
sembled estimates of TE content was most extreme in 
A. polypyramis, where the proportion of reads aligning into 
TE regions was almost fivefold higher than the proportion of 
assembled bases annotated as TEs (59% and 12%, respec- 
tively). Although the differences between assembled and 
unassembled proportions of TEs were less dramatic in the 
remaining species, our estimates of TE content increased 
across the board when we analyzed unassembled genomes. 
Moreover, the predicted proportion of TEs in the A. muscaria 
JGI assembly doubled, suggesting that the issue of underesti- 
mating TEs may also be relevant for multilibrary assemblies 
that include long insert size paired-end reads. The A. polypyr- 
amis data further underscore that high assembly contiguity is 



not necessarily an indicator of a comprehensive assembly 
(table 2), but in this case may be the result of extensive clus- 
tering, and therefore lack of assembly, of TEs outside of pro- 
tein-coding regions. 

Using a coverage-based approach also mitigates potential 
artifacts from the analysis of a mix of diploid and haploid 
genome sequences. Whether or not homozygous TE inser- 
tions are assembled onto the same or distinct contigs is de- 
pendent on the degree of heterozygosity, which may vary 
among TE families and between genomes. As relative cover- 
age considers the abundance of TE sequences compared with 
reference genes among the complete set of reads, it implicitly 
accounts for the effects of heterozygosity. 

One obvious shortcoming of our approach is its inability 
to detect wholly novel types of TEs as we annotate only 
these sequences commonly recognized as TEs, nor can our 
approach identify TEs that remain completely unassembled. 
The characterization of entirely novel types of TEs may always 
necessitate very high quality genome sequences, where TEs 
can be confidently placed into unique genomic contexts to 
determine their full extent. Other issues include biases result- 
ing from the mapping of highly repetitive regions (Treangen 
and Salzberg 2011) and biases inherent in the sequencing 
protocol, for example, GC bias (Dohm et al. 2008) and PGR 
amplification bias (Aird et al. 201 1). We have addressed map- 
ping biases by analyzing only one hit per sequenced fragment, 
and averaging coverage over TE superfamilies, on the basis 
that superfamilies are sufficiently diverged between each 
other to avoid nonspecific cross mapping. 

Comparison of the final TE content predictions between 
the two A. muscaria assemblies (fig. 2) shows that, although 
our estimates should be considered approximate, we obtain 
proportions that are within 3% of each other by mapping the 
same read data to two entirely independent assemblies gen- 
erated using different sequencing strategies. We believe that 
we are capturing the most important signal in the data, even 
in the assemblies derived from a single lane of lllumina HiSeq 
sequencing. 

TE Content Correlates with Ecology 

A clear signature of TE activity in ECM species is evident in 
both contemporary (fig. 2B) and historical (fig. 3) patterns. The 
three ECM species appear to be at different stages of TE in- 
vasion. Amanita brunnescens and A. polypyramis show signs 
of recent and ongoing TE activity, as manifested by the large 
ratios of unassembled to assembled TE content (fig. 2). The 
data suggest the presence of large numbers of young TE in- 
sertions that are too similar to assemble onto different contigs. 
Recently active families were also suggested by the presence 
of large amplified clades, especially in LINE and Gypsy 
elements (fig. 3). In contrast, A. muscaria houses a more 
modest proportion of TEs. TEs may have proliferated less ex- 
tensively in the A. muscaria genome. However, phylogenetic 
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analyses provide evidence for a number of amplifications in 
A. muscaria (fig. 3), suggesting that /A. muscaria has also ex- 
perienced TE expansions at some point in the past, even if re- 
cent TE activity is less than it is in A. brunnescens or 
A. polypyramis. 

The AS genomes of V. volvacea and A. inopinata demon- 
strate a very different pattern. These genomes encode low 
amounts of TEs, and we found only modest evidence of 
recent activity in either unassembled TE content or TE super- 
family phytogenies. However, the signatures of TE activity 
found in A. thiersii are a stark contrast to A. inopinata and 
v. volvacea. The A. tiniersii genome provided evidence for 
recent amplifications of all three superfamilies and harbored 
TE populations almost three times the size of the V. volvacea 
or A. inopinata genomes (figs. 2 and 3). These data challenge 
the simple association of an ECM niche with higher TE content 
in the Amanita. 

The numbers of TE insertions residing in a genome are de- 
pendent on 1) the rate of transposition and 2) the rate of 
survival of TE copies (Charlesworth B and Charlesworth 
D 1983). A number of ecological and population genetic pro- 
cesses influence rates of transposition and sun/ival. The trans- 
position rate is modulated by regulation of active TE copies. 
Among others, TEs may be activated by stress (Grandbastien 
1998; Capy et al. 2000) or silenced by genome defense mech- 
anisms (Daboussi and Capy 2003). TE survival depends on the 
impact an insertion has on the genome and, if it is deleterious, 
the ability of natural selection to remove it from the popula- 
tion before it is fixed. Small effective population sizes reduce 
the effectiveness of selection, allowing altered rates of fixation 
of deleterious TEs (Charlesworth B and Charlesworth D 1983; 
Lynch and Conery 2003). Demographic events, including pop- 
ulation bottlenecks, may reduce the effective population size, 
resulting in slower rates of TE loss and consequentially 
higher rates of fixation (Gherman et al. 2007; Lockton et al. 
2008). The mating system of the organism will also influence 
TE retention. In theory, the spread of a new TE copy across 
a population of selfing organisms is difficult and unlikely 
(Boutin et al. 2012). But, already established elements may 
be retained more readily, for example because of a potential 
reduction in the negative impact of ectopic recombination 
between dispersed TEs when insertions are homozygous 
(Montgomery et al. 1987; Boutin et al. 2012). Selfing also 
results in a decrease of the effective population size 
(Nordborg 2000). 

Understanding patterns of TE distributions across a phylog- 
eny and differentiating among the processes that drive pat- 
terns requires rich contextual information about species' 
natural histories. Amanita thiersii is currently undergoing a 
range expansion in North America (Wolfe, Kuo, et al. 2012), 
and genetic diversity across its new range is very low, suggest- 
ing that the species is experiencing a population bottleneck 
and has a small effective population size. Data from other 
organisms suggest that this demographic scenario enables 



TE proliferation in Eukaryotes (Lynch and Conery 2003; 
Gherman et al. 2007; Lockton et al. 2008). A population bot- 
tleneck is also expected to similarly effect different classes of 
repeats (Gherman et al. 2007), which is consistent with our 
discovery that all three superfamilies we investigated show 
amplifications in A. thiersii. 

A common narrative to explain TE expansions among the 
ECM species is less obvious. In contrast to the established link 
between pathogenicity effectors and TEs in plant pathogens 
(Sacristan et al. 2009; Rouxel et al. 2011), more evidence 
linking TEs with genes involved in the establishment and main- 
tenance of symbiosis will be required to confirm that TEs 
enable genome flexibility and the symbiotic niche. Whether 
or not common mechanistic processes drive the expansions of 
TEs in ECM species, and if so, whether they are acting on the 
rate of transposition or rate of TE sun/ival also remain to be 
determined. 

Although the ECM Amanita fit patterns described for 
L. bicolor and T. melanosporum (Martin and Selosse 2008; 
Martin et al. 2010; Veneault-Fourrey and Martin 2011), 
there is no simple association between high TE content and 
the ECM niche. TEs directly influence host-specificity genes in 
plant pathogenic fungi (Sacristan et al. 2009; Rouxel et al. 
2011), nonetheless additional forces may also influence in- 
creased TE abundance in plant pathogens. As demonstrated 
by the wide abundance of TEs in A. thiersii, the particular 
natural histories of species may also influence TE distributions. 
For example, among the biotrophic pathogens listed in the 
introduction, most have both sexual and asexual phases in 
their lifecycle (McDonald and Linde 2002; Giraud et al. 
2008), a pattern shown to result in elevated number of TEs 
in cyclically sexual populations of Daphnia pulex (Schaack, 
Choi, et al. 2010; Schaack, Pritham, et al. 2010). A more 
detailed dissection of the different processes influencing TE 
insertion, dispersal, and survival is needed to disentangle the 
causal from the incidental and enable a holistic understanding 
of the adaptive impact of TEs in biotrophic fungi. 

Data Deposition 

Raw sequencing libraries and assemblies for the A. brunnes- 
cens, A. polypyramis, A. muscaria (replicate), A. inopinata, 
and V. volvacea genomes have been deposited at National 
Center for Biotechnology Information (NCBI), BioProject 
numbers PRJNA236753, PRJNA236755, PRJNA236758, 
PRJNA236757, and PRJNA236756. The genome sequences 
of A. muscaria and A. thiersii are available at JGI (http:// 
genome.jgi.doe.gov/programs/fungi/index.jsf, last accessed 
June 17, 2014) and associated data have been deposited at 
NCBI, BioProjects PRJNA207684 and PRJNA82749, respec- 
tively. The sequence alignments of TE families used for phylo- 
genetic analysis are available from the corresponding author 
by request. 
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Supplementary Material 

Supplementary data file SI, figure SI, and tables S1-S8 are 
available at Genome Biology and Evolution online (http:// 
www.gbe.oxfordjournals.org/). 
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